Skip to content

Conversation

pjohnst5
Copy link
Contributor

@pjohnst5 pjohnst5 commented Jul 14, 2025

CNS IBDevice API Contracts

Overview

Two new APIs for managing InfiniBand (IB) devices in Azure Container Network Service (CNS):

  1. POST IB Devices for Pod - POST IB devices for a pod
  2. GET IB Device Info - GET IB device status

API 1: POST IB Devices for Pod

Endpoint

POST /ibdevices

Request Body

{
  "ibmacaddresses": [
    "60:45:bd:a4:b5:7a",
    "7c:1e:52:07:11:36"
  ],
  "podNamespace": "default",
  "podName": "example-pod"
}

Success Response

{
  "message": "Successfully assigned 2 devices to pod my-pod-my-namespace"
}

Error Response

HTTP 400

{
  "message": "Device 60:45:bd:a4:b5:7a is already assigned to another pod"
}

API 2: GET IB Device Information

Endpoint

GET /ibdevices?ibmac=60:45:bd:a4:b5:7a

Request

No request body (MAC address provided as query param)

Success Response

{
  "ibMACAddress": "60:45:bd:a4:b5:7a",
  "podNamespace": "mynamespace",
  "podName": "mypod", 
  "status": "programmingComplete",
  "message": ""
}

Device Not Found Response

HTTP 404

{
  "ibMACAddress": "60:45:bd:a4:b5:7a",
  "podNamespace": "",
  "podName": "", 
  "message": "Device not found"
}

See status.go in this PR for list of all statuses of an IB device


Request body

See api.go for the structs

  • AssignIBDevicesToPodRequest
  • AssignIBDevicesToPodResponse
  • GetIBDeviceInfoResponse

@pjohnst5 pjohnst5 force-pushed the pjohnst5/cns-api-contracts branch 2 times, most recently from 71fa67a to 0564b6d Compare July 15, 2025 00:25
@pjohnst5 pjohnst5 force-pushed the pjohnst5/cns-api-contracts branch 2 times, most recently from 7b6d80b to 82b11b0 Compare July 23, 2025 16:47
@pjohnst5 pjohnst5 force-pushed the pjohnst5/cns-api-contracts branch from c9a2fde to 37cf8f4 Compare July 30, 2025 16:16
@pjohnst5 pjohnst5 marked this pull request as ready for review July 30, 2025 16:18
@Copilot Copilot AI review requested due to automatic review settings July 30, 2025 16:18
@pjohnst5 pjohnst5 requested a review from a team as a code owner July 30, 2025 16:18
@pjohnst5 pjohnst5 requested a review from neaggarwMS July 30, 2025 16:18
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces two new REST API endpoints for managing InfiniBand (IB) devices in Azure Container Network Service (CNS), enabling NUMA-aware pod device assignment.

  • Adds PUT endpoint to assign specific IB devices to pods via MAC addresses
  • Adds GET endpoint to query the status and assignment of individual IB devices
  • Defines error codes and device status enums for the IB device management system

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
cns/types/infiniband/status.go Defines device status constants for IB device lifecycle states
cns/types/infiniband/errorcodes.go Defines error codes for IB device operations
cns/swagger.yaml Adds OpenAPI specification for the two new IB device endpoints
cns/api.go Defines Go structs for API request/response contracts
Comments suppressed due to low confidence (1)

cns/swagger.yaml:186

  • The schema name 'GetIBDeviceStatusResponse' is inconsistent with the endpoint naming convention. Based on the API path '/ibdevices/{macaddress}', it should be 'GetIBDeviceInfoResponse' to match the PR description and maintain consistency.
                $ref: "#/components/schemas/GetIBDeviceStatusResponse"

@pjohnst5 pjohnst5 requested review from a team as code owners July 30, 2025 16:20
@pjohnst5 pjohnst5 requested a review from QxBytes July 30, 2025 16:20
@pjohnst5
Copy link
Contributor Author

Waiting for #3876 to merge first before merging this

@pjohnst5 pjohnst5 force-pushed the pjohnst5/cns-api-contracts branch 2 times, most recently from dd872cf to ac97074 Compare July 30, 2025 20:18
@pjohnst5
Copy link
Contributor Author

pjohnst5 commented Aug 1, 2025

Alright @rbtr , I've made some changes, I just have 2 remaining questions:

  • Why do you want the assign action specifically in the POST API?
    • I think it may be because POST alone (good rapper name?) with no action, implies you're creating a new resource (which isn't the case here)
  • Why did you ask if a pod being created with the same name right after one was deleted would be a problem?
    • I've discovered that is a problem, but now want to know what you had in mind because we were talking about the podID
    • Were you thinking we need to pass more identification info such as InfraContainerID ?

@pjohnst5
Copy link
Contributor Author

pjohnst5 commented Aug 4, 2025

Hi @rbtr , lmk what you think now with the latest changes, thanks

@rbtr
Copy link
Collaborator

rbtr commented Aug 4, 2025

Alright @rbtr , I've made some changes, I just have 2 remaining questions:

  • Why do you want the assign action specifically in the POST API?

    • I think it may be because POST alone (good rapper name?) with no action, implies you're creating a new resource (which isn't the case here)

Exactly, if you don't have an action in the URI, the implicit contract is that POSTing to /ibdevices/<id> is a create or update of the whole object.
Thinking about it from the other direction, if POSTing /ibdevices/<id> assigns a mac to a Pod...how would you create the ibdevice entity in the first place?

  • Why did you ask if a pod being created with the same name right after one was deleted would be a problem?

    • I've discovered that is a problem, but now want to know what you had in mind because we were talking about the podID
    • Were you thinking we need to pass more identification info such as InfraContainerID ?

There's maybe a race here, since namespaced name is not unique. A pod could get a stale MTPNC?

@pjohnst5
Copy link
Contributor Author

pjohnst5 commented Aug 4, 2025

Thinking about it from the other direction, if POSTing /ibdevices/ assigns a mac to a Pod...how would you create the ibdevice entity in the first place?

Yeah that's a good point, the ibdevices are provisioned at the Azure host level
So while CNS can't really create one, are you saying you'd like me to have the POSTing of /ibdevices/<id> "register" a device with CNS?

There's maybe a race here, since namespaced name is not unique. A pod could get a stale MTPNC?

Yes you're right, we could have a race condition met here

Lemme see what we do today (cuz this already happens)

@pjohnst5
Copy link
Contributor Author

pjohnst5 commented Aug 4, 2025

Alright @rbtr , I ran some tests, and something that changes even when pods get the same names, is UID:
image

We could make this a required parameter for the POST API

  • CNS could then add it to the MTPNC on creation
    • CNS will be creating the MTPNC in this scenario btw
  • CNS could then check the MTPNC on each POST, and make sure the UID matches
    • If it matches, CNS knows the MTPNC is current
    • If it doesn't match, we know the MTPNC is stale, and CNS could return an error

Wdyt?

@rbtr
Copy link
Collaborator

rbtr commented Aug 5, 2025

who creates the MTPNC? why don't we set an ownerref to the Pod so it gets GCed?

@pjohnst5
Copy link
Contributor Author

pjohnst5 commented Aug 5, 2025

who creates the MTPNC?

Today, DNC-RC creates the MTPNC, however, in this new NUMA scenario, CNS will create the MTPNC instead

why don't we set an ownerref to the Pod so it gets GCed

For this new NUMA scenario, we could do that

Why haven't we done that in the past with DNC-RC making the MTPNC?

I don't know

Basically @rbtr , is there anything else you want different about this PR, besides adding an ownerref to the pod so it is GCed? (that anyways will come in a future PR when I actually implement the APIs, this PR is to establish the contract between the partner team and us)

@pjohnst5
Copy link
Contributor Author

pjohnst5 commented Aug 5, 2025

Hey wait we may actually set an ownerref already (I probably just missed it), here is one example MTPNC that DNC-RC already creates today:

apiVersion: multitenancy.acn.azure.com/v1alpha1
kind: MultitenantPodNetworkConfig
metadata:
  creationTimestamp: "2025-05-29T17:51:35Z"
  finalizers:
  - finalizers.acn.azure.com/dnc-operations
  generation: 1
  name: ibpod1-deployment-5dbf8f965-sdgzr
  namespace: e2e-ns-d0s9t9ibfg06a1lnpsog
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Pod
    name: ibpod1-deployment-5dbf8f965-sdgzr
    uid: 18a8b6b2-b247-4ae6-a351-5d56a203eab4
  resourceVersion: "14083"
  uid: 817a229e-24cd-4efc-b37c-75d4ff48cb5f

Looking more in the code to see where we do that

@pjohnst5
Copy link
Contributor Author

pjohnst5 commented Aug 5, 2025

Yeah I found where it is added, I sent it to you privately

@pjohnst5
Copy link
Contributor Author

pjohnst5 commented Aug 5, 2025

So, we know there's an owner ref on the MTPNC, of the Pod

Are you saying, we should also add, an ownerref on the pod, of the MTPNC?

Copy link

This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

@github-actions github-actions bot added the stale Stale due to inactivity. label Aug 20, 2025
@pjohnst5 pjohnst5 force-pushed the pjohnst5/cns-api-contracts branch from 1539d9a to a045606 Compare August 20, 2025 22:58
@pjohnst5
Copy link
Contributor Author

Hi @rbtr , Kshitija and I were talking and on the POST API, if the pod has 7 or 8 macs, the URL will get really long.. 😅
POST /ibdevices:assign?ibmacs=e6:ed:09:b1:97:7e,ea:2a:3f:19:fd:49,e9:b1:8b:f9:d1:e0,00:7b:34:84:c5:0f,5f:26:ba:f7:2b:41,07:81:05:db:2a:36,a8:e9:a1:50:3b:2a&podnamespace=mynamespace&podname=mypod

Can I move them into arequest body instead? 😅
POST /ibdevices:assign

{
"MACs": ["e6:ed:09:b1:97:7e",
"ea:2a:3f:19:fd:49",
"e9:b1:8b:f9:d1:e0",
"00:7b:34:84:c5:0f",
"5f:26:ba:f7:2b:41",
"07:81:05:db:2a:36",
"a8:e9:a1:50:3b:2a"],
"podnamespace": "mynamespace"
"podname": "mypod"
}

@github-actions github-actions bot removed the stale Stale due to inactivity. label Aug 21, 2025
@pjohnst5 pjohnst5 force-pushed the pjohnst5/cns-api-contracts branch from 5287c32 to ad00d82 Compare August 25, 2025 17:36
@pjohnst5
Copy link
Contributor Author

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@pjohnst5 pjohnst5 added this pull request to the merge queue Aug 25, 2025
Merged via the queue into master with commit ac6cb6a Aug 25, 2025
16 checks passed
@pjohnst5 pjohnst5 deleted the pjohnst5/cns-api-contracts branch August 25, 2025 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants